ggbetweenstats(
  data = iris,
  x = Species,
  y = Sepal.Length,
  title = "Distribution of sepal length across Iris species"
)
Indrajeet Patil
{ggstatsplot}?
Current CRAN package count >23,000
ggstatsplot provides
📊 information-rich plots with statistical details
📝 suitable for faster (exploratory) data analysis and reporting
Graphical summaries can reveal problems not visible from numerical statistics.
The grammar of graphics is a powerful framework (Wilkinson, 2011) and can help you make any graphics fitting your specific data visualization needs! But…
Quality of Life (QoL) improvements with ggstatsplot
Provide ready-made plots with defaults following the best practices in statistical reporting and data visualization.
In a typical exploratory data analysis workflow, data visualization and statistical modeling are two different phases: visualization informs modeling, and modeling in turn can suggest a different visualization, and so on.
Central idea of ggstatsplot
Simple: combine these two phases into one!
…but we will come back to that later 📌
Let’s get started first!
Package available for installation on CRAN and GitHub:
| Type | Command |
|---|---|
| Release | install.packages("ggstatsplot") |
| Development | pak::pak("IndrajeetPatil/ggstatsplot") |
ggbetweenstats(): For between-group comparisons
Important
✏️ Defaults
Statistical approaches available
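As a minimal sketch of how a statistical approach is selected, the `type` argument of ggbetweenstats() switches between the available frameworks. The iris data and the object names are illustrative choices, not taken from the slides.

```r
library(ggstatsplot)

# Parametric approach (the default); for three groups this is a
# Welch's one-way ANOVA, since equal variances are not assumed by default
p_parametric <- ggbetweenstats(
  data = iris,
  x = Species,
  y = Sepal.Length,
  type = "parametric"
)

# Non-parametric approach; for three groups this is a Kruskal-Wallis test
p_nonparametric <- ggbetweenstats(
  data = iris,
  x = Species,
  y = Sepal.Length,
  type = "nonparametric"
)

p_parametric
p_nonparametric
```

The same argument also accepts `"robust"` and `"bayes"`.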
Standard approach
Pearson’s correlation test revealed that, across 142 participants, variable x was negatively correlated with variable y: \(t(140)=-0.76, p=.446\). The effect size \((r=-0.06, 95\% CI [-.23,.10])\) was small, as per Cohen’s (1988) conventions. The Bayes Factor for the same analysis revealed that the data were 5.81 times more probable under the null hypothesis as compared to the alternative hypothesis. This can be considered moderate evidence (Jeffreys, 1961) in favor of the null hypothesis (absence of any correlation between x and y).
ggstatsplot approach
Parametric
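A sketch of what the ggstatsplot approach to a correlation analysis might look like in code. The data behind the report above are not shown in the slides, so two mtcars variables stand in for x and y here; this is an assumption for illustration only.

```r
library(ggstatsplot)

# Scatterplot with the Pearson's correlation results in the subtitle
p <- ggscatterstats(
  data = mtcars,
  x = wt,
  y = mpg,
  type = "parametric", # Pearson's correlation
  marginal = FALSE     # skip marginal distributions to keep the plot simple
)

p

# The statistics displayed in the plot are also available as data frames
stats <- extract_stats(p)
stats$subtitle_data
```

The plot carries the test statistic, p-value, effect size, and confidence interval that the standard approach would spell out in a paragraph.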
Hunting for packages
📦 for inferential statistics ({stats})
📦 computing effect size + CIs (effectsize)
📦 for descriptive statistics (skimr)
📦 pairwise comparisons (multcomp)
📦 Bayesian hypothesis testing (BayesFactor)
📦 Bayesian estimation (bayestestR)
📦 …
Inconsistent APIs
🤔 accepts data frame, vector, matrix?
🤔 long/wide format data?
🤔 works with NAs?
🤔 returns data frame, vector, matrix?
🤔 works with tibbles?
🤔 has all necessary details?
🤔 …
“What if I don’t like the default plots?” 🤔
Aesthetic preferences are not an excuse to avoid ggstatsplot! 😻 Any ggplot theme or palette can be used.
N.B. The default palette is colorblind-friendly.
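For instance, a different theme and palette can be requested through the `ggtheme`, `package`, and `palette` arguments. The particular theme and palette below are arbitrary picks for illustration.

```r
library(ggstatsplot)

p <- ggbetweenstats(
  data = iris,
  x = Species,
  y = Sepal.Length,
  ggtheme = ggplot2::theme_minimal(), # any ggplot2 theme object
  package = "RColorBrewer",           # package that provides the palette
  palette = "Set2"                    # palette name within that package
)

p
```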
{ggplot2} 🛠
You can modify ggstatsplot plots further using ggplot2 functions. 🎉
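Because the returned object is a ggplot, further layers can be added with `+`. A small sketch, with label text chosen only for illustration:

```r
library(ggstatsplot)
library(ggplot2)

p <- ggbetweenstats(data = iris, x = Species, y = Sepal.Length)

# Modify the ready-made plot with ordinary ggplot2 functions
p_custom <- p +
  labs(x = "Iris species", y = "Sepal length (cm)") +
  theme(axis.title = element_text(face = "bold"))

p_custom
```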
Things to be wary of
Promotes mindless application of statistical tests.
Easy-to-use software can lead to misuse.
Things that will pull you in
Each commit must pass many QA checks:
CI Checks (GitHub Actions)
Benefits of the ggstatsplot approach
ggstatsplot, a package that combines data visualization and statistical analysis in a single step, is a powerful tool that:
Source code for these slides can be found on GitHub.
If you are interested in good programming and software development practices, check out my other slide decks.
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.4.2 (2024-10-31)
os macOS Sequoia 15.1
system aarch64, darwin20
hostname MacBookAir.fritz.box
ui X11
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz Europe/Berlin
date 2024-11-10
pandoc 3.5 @ /usr/local/bin/ (via rmarkdown)
quarto 1.6.33 @ /usr/local/bin/quarto
─ Packages ───────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
base * 4.4.2 2024-11-01 [2] local
BayesFactor 0.9.12-4.7 2024-01-24 [2] CRAN (R 4.4.0)
bayestestR 0.15.0 2024-10-17 [1] CRAN (R 4.4.1)
bitops 1.0-8 2024-07-29 [2] CRAN (R 4.4.1)
BWStest 0.2.3 2023-10-10 [2] CRAN (R 4.4.0)
cachem 1.1.0 2024-05-16 [1] CRAN (R 4.4.0)
cli 3.6.3 2024-06-21 [1] CRAN (R 4.4.0)
coda 0.19-4.1 2024-01-31 [2] CRAN (R 4.4.0)
codetools 0.2-20 2024-03-31 [2] CRAN (R 4.4.2)
colorspace 2.1-1 2024-07-26 [2] CRAN (R 4.4.0)
compiler 4.4.2 2024-11-01 [2] local
correlation 0.8.6 2024-10-26 [1] CRAN (R 4.4.1)
cranlogs 2.1.1 2019-04-29 [2] RSPM (R 4.4.0)
curl 6.0.0 2024-11-05 [1] CRAN (R 4.4.1)
data.table 1.16.0 2024-08-27 [2] CRAN (R 4.4.1)
datasets * 4.4.2 2024-11-01 [2] local
datawizard 0.13.0 2024-10-05 [1] CRAN (R 4.4.1)
digest 0.6.37 2024-08-19 [1] CRAN (R 4.4.1)
dplyr 1.1.4 2023-11-17 [2] CRAN (R 4.4.0)
effectsize 0.8.9 2024-07-03 [1] CRAN (R 4.4.0)
emmeans 1.10.4 2024-08-21 [2] CRAN (R 4.4.1)
estimability 1.5.1 2024-05-12 [2] RSPM (R 4.4.0)
evaluate 1.0.1 2024-10-10 [1] CRAN (R 4.4.1)
fansi 1.0.6 2023-12-08 [1] CRAN (R 4.4.0)
farver 2.1.2 2024-05-13 [2] RSPM (R 4.4.0)
fastmap 1.2.0 2024-05-15 [1] CRAN (R 4.4.0)
fortunes 1.5-4 2016-12-29 [2] RSPM (R 4.4.0)
generics 0.1.3 2022-07-05 [2] CRAN (R 4.4.0)
ggiraph 0.8.10 2024-05-17 [2] RSPM (R 4.4.0)
ggiraphExtra 0.3.0 2020-10-06 [2] RSPM (R 4.4.0)
ggplot2 * 3.5.1 2024-04-23 [2] CRAN (R 4.4.0)
ggrepel 0.9.6 2024-09-07 [2] CRAN (R 4.4.1)
ggsignif 0.6.4 2022-10-13 [2] CRAN (R 4.4.0)
ggstatsplot * 0.12.5 2024-11-01 [1] CRAN (R 4.4.1)
ggthemes 5.1.0 2024-02-10 [2] CRAN (R 4.4.0)
glue 1.8.0 2024-09-30 [1] CRAN (R 4.4.1)
gmp 0.7-5 2024-08-23 [2] CRAN (R 4.4.1)
graphics * 4.4.2 2024-11-01 [2] local
grDevices * 4.4.2 2024-11-01 [2] local
grid 4.4.2 2024-11-01 [2] local
gtable 0.3.5 2024-04-22 [2] CRAN (R 4.4.0)
htmltools 0.5.8.1 2024-04-04 [1] CRAN (R 4.4.0)
htmlwidgets 1.6.4 2023-12-06 [1] CRAN (R 4.4.0)
httr 1.4.7 2023-08-15 [2] RSPM
insight 0.20.5 2024-10-02 [1] CRAN (R 4.4.1)
jsonlite 1.8.9 2024-09-20 [1] CRAN (R 4.4.1)
knitr 1.49 2024-11-08 [1] CRAN (R 4.4.2)
kSamples 1.2-10 2023-10-07 [2] CRAN (R 4.4.0)
labeling 0.4.3 2023-08-29 [2] CRAN (R 4.4.0)
lattice 0.22-6 2024-03-20 [2] CRAN (R 4.4.2)
lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.4.0)
lubridate 1.9.3 2023-09-27 [2] RSPM
magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.4.0)
MASS 7.3-61 2024-06-13 [2] CRAN (R 4.4.2)
Matrix 1.7-1 2024-10-18 [2] CRAN (R 4.4.2)
MatrixModels 0.5-3 2023-11-06 [2] RSPM
memoise 2.0.1 2021-11-26 [1] CRAN (R 4.4.0)
methods * 4.4.2 2024-11-01 [2] local
mgcv 1.9-1 2023-12-21 [2] CRAN (R 4.4.2)
multcomp 1.4-26 2024-07-18 [2] CRAN (R 4.4.0)
multcompView 0.1-10 2024-03-08 [2] RSPM
munsell 0.5.1 2024-04-01 [2] CRAN (R 4.4.0)
mvtnorm 1.3-1 2024-09-03 [2] CRAN (R 4.4.1)
mycor 0.1.1 2018-04-10 [2] RSPM (R 4.4.0)
nlme 3.1-166 2024-08-14 [2] CRAN (R 4.4.2)
packageRank * 0.9.2 2024-08-01 [2] CRAN (R 4.4.0)
paletteer 1.6.0 2024-01-21 [2] CRAN (R 4.4.0)
parallel 4.4.2 2024-11-01 [2] local
parameters 0.23.0 2024-10-18 [1] CRAN (R 4.4.1)
patchwork 1.3.0 2024-09-16 [1] CRAN (R 4.4.1)
pbapply 1.7-2 2023-06-27 [2] CRAN (R 4.4.0)
performance 0.12.4 2024-10-18 [1] CRAN (R 4.4.1)
pillar 1.9.0 2023-03-22 [1] CRAN (R 4.4.0)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.4.0)
pkgsearch 3.1.3 2023-12-10 [2] RSPM (R 4.4.0)
plyr 1.8.9 2023-10-02 [2] CRAN (R 4.4.0)
PMCMRplus 1.9.12 2024-09-08 [1] CRAN (R 4.4.1)
ppcor 1.1 2015-12-03 [2] RSPM (R 4.4.0)
prismatic 1.1.2 2024-04-10 [2] RSPM
purrr 1.0.2 2023-08-10 [1] CRAN (R 4.4.0)
R.methodsS3 1.8.2 2022-06-13 [1] CRAN (R 4.4.0)
R.oo 1.27.0 2024-11-01 [1] CRAN (R 4.4.1)
R.utils 2.12.3 2023-11-18 [1] CRAN (R 4.4.0)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.4.0)
RColorBrewer 1.1-3 2022-04-03 [2] CRAN (R 4.4.0)
Rcpp 1.0.13-1 2024-11-02 [1] CRAN (R 4.4.1)
RcppParallel 5.1.9 2024-08-19 [2] CRAN (R 4.4.1)
RCurl 1.98-1.16 2024-07-11 [2] CRAN (R 4.4.0)
rematch2 2.1.2 2020-05-01 [2] RSPM
reshape2 1.4.4 2020-04-09 [2] RSPM
rlang 1.1.4 2024-06-04 [1] CRAN (R 4.4.0)
rmarkdown 2.29 2024-11-04 [1] CRAN (R 4.4.1)
Rmpfr 0.9-5 2024-01-21 [2] RSPM
rstantools 2.4.0 2024-01-31 [2] RSPM
rstudioapi 0.17.1 2024-10-22 [1] CRAN (R 4.4.1)
sandwich 3.1-0 2023-12-11 [2] CRAN (R 4.4.0)
scales 1.3.0 2023-11-28 [2] CRAN (R 4.4.0)
sessioninfo 1.2.2.9000 2024-11-09 [1] Github (r-lib/sessioninfo@37c81af)
sjlabelled 1.2.0 2022-04-10 [2] RSPM (R 4.4.0)
sjmisc 2.8.10 2024-05-13 [2] RSPM (R 4.4.0)
snakecase 0.11.1 2023-08-27 [2] CRAN (R 4.4.0)
splines 4.4.2 2024-11-01 [2] local
stats * 4.4.2 2024-11-01 [2] local
statsExpressions 1.6.1 2024-10-31 [1] CRAN (R 4.4.1)
stringi 1.8.4 2024-05-06 [2] RSPM (R 4.4.0)
stringr 1.5.1 2023-11-14 [2] RSPM
sugrrants 0.2.9 2024-03-12 [2] RSPM (R 4.4.0)
SuppDists 1.1-9.8 2024-09-03 [2] CRAN (R 4.4.1)
survival 3.7-0 2024-06-05 [2] CRAN (R 4.4.2)
systemfonts 1.1.0 2024-05-15 [1] CRAN (R 4.4.0)
TH.data 1.1-2 2023-04-17 [2] CRAN (R 4.4.0)
tibble 3.2.1 2023-03-20 [1] CRAN (R 4.4.0)
tidyr 1.3.1 2024-01-24 [2] CRAN (R 4.4.0)
tidyselect 1.2.1 2024-03-11 [2] CRAN (R 4.4.0)
timechange 0.3.0 2024-01-18 [2] RSPM
tools 4.4.2 2024-11-01 [2] local
utf8 1.2.4 2023-10-22 [1] CRAN (R 4.4.0)
utils * 4.4.2 2024-11-01 [2] local
uuid 1.2-1 2024-07-29 [2] CRAN (R 4.4.1)
vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.4.0)
withr 3.0.2 2024-10-28 [1] CRAN (R 4.4.1)
xfun 0.49 2024-10-31 [1] CRAN (R 4.4.1)
xtable 1.8-4 2019-04-21 [1] CRAN (R 4.4.0)
yaml 2.3.10 2024-07-26 [1] CRAN (R 4.4.1)
zeallot 0.1.0 2018-01-28 [2] RSPM
zoo 1.8-12 2023-04-13 [2] CRAN (R 4.4.0)
[1] /Users/indrajeetpatil/Library/R/arm64/4.4/library
[2] /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library
* ── Packages attached to the search path.
──────────────────────────────────────────────────────────────────────────────
ggwithinstats(): Hypothesis about group differences: repeated measures design
Important
✏️ Defaults
Statistical approaches available
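A minimal repeated-measures sketch, using the `sleep` dataset that ships with R as a stand-in (the slides' own data are not shown). In `sleep`, the same 10 subjects are measured under two drugs, and the rows are ordered by subject within each group, which is the layout the function expects.

```r
library(ggstatsplot)

# Within-subjects comparison of two conditions
p <- ggwithinstats(
  data = sleep,
  x = group, # condition (drug 1 vs. drug 2)
  y = extra, # increase in hours of sleep
  type = "parametric"
)

p
```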
gghistostats(): Distribution of a numeric variable
Important
✏️ Defaults
Statistical approaches available
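A short sketch of a histogram with a one-sample test. The dataset and the comparison value of 20 are hypothetical choices for illustration.

```r
library(ggstatsplot)

p <- gghistostats(
  data = mtcars,
  x = mpg,
  test.value = 20 # hypothetical value to test the distribution against
)

p
```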
ggdotplotstats(): Labeled numeric variable
Important
✏️ Defaults
Statistical approaches available
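A sketch of the labeled variant, where each value of the numeric variable is tied to a label. The `mpg` dataset from ggplot2 and the comparison value are assumptions made for the example.

```r
library(ggstatsplot)

p <- ggdotplotstats(
  data = ggplot2::mpg,
  x = cty,          # numeric variable (city miles per gallon)
  y = manufacturer, # label for each point
  test.value = 15   # hypothetical value for the one-sample test
)

p
```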
ggscatterstats(): Hypothesis about correlation: Two numeric variables
ggcorrmat(): Hypothesis about correlation: Multiple numeric variables
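A minimal sketch of a correlation matrix over several numeric variables. The choice of mtcars columns is arbitrary.

```r
library(ggstatsplot)

p <- ggcorrmat(
  data = mtcars,
  cor.vars = c(mpg, disp, hp, wt) # variables to correlate
)

p
```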
ggpiestats(): Hypothesis about composition of categorical variables
ggbarstats(): Hypothesis about composition of categorical variables
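Both functions share the same interface, so a sketch for one carries over to the other. With `x` and `y`, the association between two categorical variables is tested; with `x` alone, the proportions across its levels are tested. The mtcars columns are illustrative.

```r
library(ggstatsplot)

# Association between two categorical variables
p_pie <- ggpiestats(data = mtcars, x = am, y = cyl)
p_bar <- ggbarstats(data = mtcars, x = am, y = cyl)

# Single categorical variable: proportions across its levels
p_single <- ggpiestats(data = mtcars, x = cyl)

p_pie
p_bar
p_single
```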
ggcoefstats(): Hypothesis about regression coefficients
Important
✏️ Defaults
Supports all regression models supported in {easystats} ecosystem.
Meta-analysis is also supported!
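A minimal sketch with a linear model; the model formula is an arbitrary example, and other supported model classes would be passed in the same way.

```r
library(ggstatsplot)

# Fit a regression model, then plot its coefficients
model <- stats::lm(mpg ~ wt + hp, data = mtcars)
p <- ggcoefstats(model)

p
```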
Iterating over a grouping variable
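One way to repeat the same analysis across the levels of a grouping variable is a `grouped_` variant such as grouped_ggbetweenstats(). The variables below are illustrative stand-ins.

```r
library(ggstatsplot)

# One between-group plot per level of `am`, combined into a single figure
p <- grouped_ggbetweenstats(
  data = mtcars,
  x = vs,
  y = mpg,
  grouping.var = am
)

p
```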
{ggstatsplot} benefits
Note
| Functions | Description | Parametric | Non-parametric | Robust | Bayesian |
|---|---|---|---|---|---|
| ggbetweenstats() | Between group comparisons | ✅ | ✅ | ✅ | ✅ |
| ggwithinstats() | Within group comparisons | ✅ | ✅ | ✅ | ✅ |
| gghistostats(), ggdotplotstats() | Distribution of a numeric variable | ✅ | ✅ | ✅ | ✅ |
| ggcorrmat() | Correlation matrix | ✅ | ✅ | ✅ | ✅ |
| ggscatterstats() | Correlation between two variables | ✅ | ✅ | ✅ | ✅ |
| ggpiestats(), ggbarstats() | Association between categorical variables | ✅ | NA | NA | ✅ |
| ggpiestats(), ggbarstats() | Equal proportions for categorical variable levels | ✅ | NA | NA | ✅ |
| ggcoefstats() | Regression modeling | ✅ | ✅ | ✅ | ✅ |
| ggcoefstats() | Random-effects meta-analysis | ✅ | NA | ✅ | ✅ |
“half of all published psychology papers that use NHST contained at least one p-value that was inconsistent with its test statistic and degrees of freedom. One in eight papers contained a grossly inconsistent p-value that may have affected the statistical conclusion”
Since the plot and the statistical analysis are yoked together, the chances of making an error in reporting the results are minimized.
No need to worry about updating figures and statistical details separately. 🔗
\(p > 0.05\): The null hypothesis (H0) can’t be rejected
But can it be accepted?! Null Hypothesis Significance Testing 🤫
“In 72% of cases, nonsignificant results were misinterpreted, in that the authors inferred that the effect was absent. A Bayesian reanalysis revealed that fewer than 5% of the nonsignificant findings provided strong evidence (i.e., \(BF_{01} > 10\)) in favor of the null hypothesis over the alternative hypothesis.”
Juxtaposing frequentist and Bayesian statistics for the same analysis helps to properly interpret the null results.
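A sketch of this juxtaposition in practice: with the parametric approach, the frequentist results appear in the subtitle and the Bayesian results in the caption of the same plot. The ToothGrowth dataset is an illustrative choice, not one used in the slides.

```r
library(ggstatsplot)

p <- ggbetweenstats(
  data = ToothGrowth,
  x = supp,
  y = len,
  type = "parametric",
  bf.message = TRUE # display the Bayesian analysis in the caption
)

p

# Both sets of results can be pulled out of the plot as data frames
stats <- extract_stats(p)
stats$subtitle_data # frequentist test
stats$caption_data  # Bayesian counterpart
```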
Minimal code needed (data, x, y): minimizes chances of error + tidy scripts. 💅
Disembodied figures stand on their own and are easy to evaluate. 🧐
More breathing room for theoretical discussion and other text. ✍
❌ an alternative to learning ggplot2
✅ the more you know ggplot2, the better you can modify the defaults to your liking
❌ meant to be used in talks/presentations
✅ defaults too complicated for effectively communicating results in time-constrained presentation settings, e.g. conference talks
❌ only relevant when used in publications
✅ publication is not necessary; it can also be useful during the exploratory phase alone